Visualize: Layers, Exploratory Data Analysis, Communication

Module 04

Ray J. Hoobler

Layers (ggplot2)

Aesthetic mapping

What can be included in the arguments of the aes() function?

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>)
  )

Variables (quantitative or categorical) can be mapped to the following aesthetics. Note that shape and linetype accept only categorical variables:

  • x
  • y
  • color
  • fill
  • size
  • shape
  • linetype
  • alpha
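Not every aesthetic accepts every kind of variable. The minimal sketch below assumes the palmerpenguins package is installed, and `penguins_df` is just a local name. Mapping the categorical island to shape works. Mapping the continuous body_mass_g to shape raises an error when the plot is built. layer_data() is used here only to force the build without drawing anything.

```r
library(ggplot2)

# Assumes the palmerpenguins package is installed
penguins_df <- palmerpenguins::penguins

# Categorical variable (island) mapped to shape: works
p_ok <- ggplot(penguins_df, aes(x = bill_length_mm, y = bill_depth_mm, shape = island)) +
  geom_point(na.rm = TRUE)

# Continuous variable (body_mass_g) mapped to shape: fails when the plot is built
p_bad <- ggplot(penguins_df, aes(x = bill_length_mm, y = bill_depth_mm, shape = body_mass_g)) +
  geom_point(na.rm = TRUE)

# layer_data() builds the plot and returns the computed layer data
shapes_ok <- unique(layer_data(p_ok)$shape)
length(shapes_ok)  # one shape per island

# Capture the error message instead of stopping the script
msg_bad <- tryCatch(layer_data(p_bad), error = function(e) conditionMessage(e))
msg_bad
```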

Explore Aesthetics with the Palmer Penguins Data (1/6)

Code
library(palmerpenguins)
library(tidyverse)

ggplot(data = penguins) + 
  geom_point(mapping = aes(x = bill_length_mm, y = bill_depth_mm))

Explore Aesthetics with the Palmer Penguins Data (2/6)

Code
ggplot(data = penguins) + 
  geom_point(mapping = aes(x = bill_length_mm, y = bill_depth_mm, 
                           color = species))

Explore Aesthetics with the Palmer Penguins Data (3/6)

Code
ggplot(data = penguins) + 
  geom_point(mapping = aes(x = bill_length_mm, y = bill_depth_mm, 
                           color = species, 
                           size = body_mass_g))

Explore Aesthetics with the Palmer Penguins Data (4/6)

Code
ggplot(data = penguins) + 
  geom_point(mapping = aes(x = bill_length_mm, y = bill_depth_mm, 
                           color = species, 
                           size = body_mass_g, 
                           shape=island))

Explore Aesthetics with the Palmer Penguins Data (5/6)

Code
ggplot(data = penguins) + 
  geom_point(mapping = aes(x = bill_length_mm, y = bill_depth_mm, 
                           size = body_mass_g, 
                           shape = island),
             color = "blue", 
             alpha = 0.5)

Explore Aesthetics with the Palmer Penguins Data (6/6)

What’s happening here?

Code
ggplot(data = penguins) + 
  geom_point(mapping = aes(x = bill_length_mm, y = bill_depth_mm, 
                           color = "blue"))
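One way to investigate the question is to compare the colors ggplot2 actually computes. This sketch assumes palmerpenguins is installed. Inside aes(), "blue" is treated as data, meaning a one-level categorical variable, so the points get the first color of the default hue palette and a legend entry labelled "blue". Outside aes(), "blue" is a fixed setting applied to every point.

```r
library(ggplot2)

# Assumes the palmerpenguins package is installed
penguins_df <- palmerpenguins::penguins

# "blue" inside aes(): mapped, i.e. treated as a one-level categorical variable
p_mapped <- ggplot(penguins_df) +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm, color = "blue"), na.rm = TRUE)

# "blue" outside aes(): set, i.e. a constant applied to every point
p_set <- ggplot(penguins_df) +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm), color = "blue", na.rm = TRUE)

# Colours ggplot2 computed for each layer
unique(layer_data(p_mapped)$colour)  # "#F8766D", the first default hue
unique(layer_data(p_set)$colour)     # "blue"
```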

Geometric Objects (1/2)

ggplot2 provides more than 40 geoms!

Code
library(patchwork)

p1 <- ggplot(penguins, aes(x = body_mass_g)) + 
  geom_histogram()

p2 <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_density()

p3 <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_boxplot()

p1 + p2 + p3
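The geoms available in your installed version can be listed directly. This is a quick sketch. The exact count depends on the ggplot2 version, and it includes a few aliases such as geom_bin2d() and geom_bin_2d().

```r
library(ggplot2)

# All attached ggplot2 objects whose names start with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
length(geoms)
head(geoms, 10)
```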

Geometric Objects (2/2)

Extension packages are available for specialty plots (e.g., “pair plots” from GGally)

Code
library(GGally)

# Pair plot of the four numeric measurements (columns 3 through 6), colored by species
ggpairs(penguins, columns = 3:6, mapping = aes(color = species, alpha = 1/4))

Facets

The pair plot above resembles a facet plot: a grid of small, coordinated panels. In ggplot2, we create a faceted plot by adding facet_wrap() or facet_grid() to the ggplot object.

Code
penguins |> 
  ggplot() +
  geom_histogram(aes(x = body_mass_g/1000, fill = species), bins = 30) +
  facet_wrap(vars(island)) +
  theme(panel.spacing.x = unit(1, "lines"))

facet_grid() (1/3)

Code
penguins |> 
  ggplot() +
  geom_point(aes(x = bill_length_mm, y = flipper_length_mm, color=species)) +
  facet_grid(vars(sex), vars(island)) +
  theme(panel.spacing.x = unit(1, "lines"))

facet_grid() (2/3)

Code
penguins |> 
  filter(!is.na(sex)) |>
  ggplot() +
  geom_point(aes(x = bill_length_mm, y = flipper_length_mm, color=species)) +
  facet_grid(vars(sex), vars(island)) +
  theme(panel.spacing.x = unit(1, "lines"))

facet_grid() (3/3)

Code
penguins |> 
  filter(!is.na(sex)) |>
  ggplot() +
  geom_point(aes(x = bill_length_mm, y = flipper_length_mm, color=island)) +
  facet_grid(vars(species), vars(sex)) +
  theme(panel.spacing.x = unit(1, "lines"))

Facets using the ~ operator

The ~ operator gives facet_grid() a formula shorthand of the form rows ~ columns. Here, species ~ . creates one row of panels per species and uses no column variable.

Code
penguins |> 
  filter(!is.na(sex)) |>
  ggplot() +
  geom_point(aes(x = bill_length_mm, y = flipper_length_mm, color=island)) +
  facet_grid(species ~ .) +
  theme(panel.spacing.x = unit(1, "lines"))
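The formula shorthand and the vars() interface used on the earlier slides produce the same panel layout. This sketch assumes palmerpenguins is installed. It compares the layouts that ggplot_build() computes for facet_grid(species ~ .) and facet_grid(rows = vars(species)).

```r
library(ggplot2)

# Assumes the palmerpenguins package is installed
penguins_df <- palmerpenguins::penguins

base_plot <- ggplot(penguins_df, aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point(na.rm = TRUE)

# Formula interface: rows ~ columns, with "." meaning "no facet in this direction"
p_formula <- base_plot + facet_grid(species ~ .)

# The same layout with the vars() interface
p_vars <- base_plot + facet_grid(rows = vars(species))

layout_formula <- ggplot_build(p_formula)$layout$layout
layout_vars <- ggplot_build(p_vars)$layout$layout

# One row per species, a single column
layout_formula[, c("PANEL", "ROW", "COL", "species")]
```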

Using the Dot (.) in ggplot2 Faceting

Prompt for Claude

Can you explain the use of “.” in R? Specifically, as it’s used in ggplot2 for functions like facet_grid(. ~ species)

Response from Claude

The dot (.) in R, especially as used in ggplot2 functions like facet_grid(), has a special meaning. Let me explain its use:

  1. In ggplot2’s facet_grid():

    In the context of facet_grid(. ~ species), the dot serves as a placeholder. It indicates that you don’t want to split the plot on the rows (left side of the tilde ~), but you do want to split it on the columns (right side of the tilde) based on the “species” variable.

    • facet_grid(. ~ species): Creates separate plots for each unique value in the “species” column, arranged horizontally.
    • facet_grid(species ~ .): Would create separate plots for each unique value in the “species” column, arranged vertically.
    • facet_grid(var1 ~ var2): Would create a grid of plots, with “var1” determining the rows and “var2” the columns.
  2. In other R contexts:

    The dot can also be used in other situations in R:

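    • In magrittr pipes (%>%), the dot represents the object being passed through the pipe. The native pipe (|>) uses an underscore (_) placeholder instead, which must be passed to a named argument (R 4.2 and later).
    • In model formulas, it represents “all other variables” not explicitly mentioned (e.g., lm(y ~ ., data = df)).
    • In function definitions, the related but distinct ellipsis (...) collects additional arguments to pass on to other functions.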
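The uses of the dot listed above can be checked in a few lines. This sketch assumes palmerpenguins and magrittr are installed, and it needs R 4.2 or later for the `_` placeholder. It uses the built-in mtcars data for the formula and pipe examples, and count_args() is a hypothetical helper written only for illustration.

```r
library(ggplot2)
library(magrittr)  # provides %>% and its "." placeholder

# Assumes the palmerpenguins package is installed
penguins_df <- palmerpenguins::penguins

# 1. Faceting: "." means "no faceting in this direction"
p_cols <- ggplot(penguins_df, aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point(na.rm = TRUE) +
  facet_grid(. ~ species)           # one row, one column per species
panel_layout <- ggplot_build(p_cols)$layout$layout
c(rows = max(panel_layout$ROW), cols = max(panel_layout$COL))

# 2. Model formulas: "." means "all other columns"
fit_all <- lm(mpg ~ ., data = mtcars)
length(coef(fit_all))               # intercept + 10 predictors = 11

# 3. Pipes: magrittr uses ".", the native pipe (R >= 4.2) uses "_"
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)
fit_native <- mtcars |> lm(mpg ~ wt, data = _)
all.equal(coef(fit_magrittr), coef(fit_native))

# 4. Function definitions: "..." (not ".") collects extra arguments
count_args <- function(...) length(list(...))  # hypothetical helper
count_args(1, "a", TRUE)            # 3
```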

Assignment

Chapter 9, Exercises 9.4.1: questions 1 through 7

Chapter 9, Exercises 9.3.1: questions 1 through 4

Statistical Transformations (1/2)

Several geoms in ggplot2 compute a statistical transformation of the data before plotting. Examples include:

  • geom_histogram() calculates the count of observations in bins.
  • geom_density() computes a kernel density estimate of a variable.
  • geom_bar() calculates the count of observations for each level of a categorical variable.
  • geom_boxplot() calculates the median, quartiles, and outliers of a variable.
  • geom_smooth() calculates a smoothed line (or fit) through the data.
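You can inspect what a stat computed by looking at the layer data after the plot is built. This is a minimal sketch assuming palmerpenguins is installed. It shows that geom_bar() (via stat_count()) produces the same per-species counts as table().

```r
library(ggplot2)

# Assumes the palmerpenguins package is installed
penguins_df <- palmerpenguins::penguins

p <- ggplot(penguins_df, aes(x = species)) +
  geom_bar()

# layer_data() returns the data after the stat has run:
# one row per species, with the computed `count`
bar_data <- layer_data(p)
bar_data[, c("x", "count")]

# The same numbers computed directly
table(penguins_df$species)
```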

Figure 9.2 from R4DS 2e provides the following graphic:

Statistical Transformations (2/2)

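Every geom has a default statistical transformation (stat), and every stat_*() function has a default geom. For example, geom_histogram() uses stat_bin(), geom_bar() uses stat_count(), and geom_point() uses stat_identity(), which leaves the data unchanged.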

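A stat can also compute additional variables that you can map to aesthetics in place of the defaults (for example, y = after_stat(density) instead of the default count). For instance, the documentation for geom_histogram() lists the following computed variables: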

Computed variables

These are calculated by the ‘stat’ part of layers and can be accessed with delayed evaluation.

  • after_stat(count)
    number of points in bin.

  • after_stat(density)
    density of points in bin, scaled to integrate to 1.

  • after_stat(ncount)
    count, scaled to a maximum of 1.

  • after_stat(ndensity)
    density, scaled to a maximum of 1.

  • after_stat(width)
    widths of bins.

Code
penguins |> 
  ggplot() +
  geom_histogram(aes(x = body_mass_g, y = after_stat(density), fill = species), 
                 bins = 30, show.legend = FALSE) +
  theme(
    axis.title.y = element_text(size = 24, color = "red", face = "bold")
  )
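To see why after_stat(density) is described as "scaled to integrate to 1," you can check the computed values directly. This sketch assumes palmerpenguins is installed. It uses bin edges (xmax - xmin) as the bin width, and checks that density multiplied by bin width sums to 1 within each species, because stat_bin() computes density separately for each group.

```r
library(ggplot2)

# Assumes the palmerpenguins package is installed
penguins_df <- palmerpenguins::penguins

p <- ggplot(penguins_df, aes(x = body_mass_g, y = after_stat(density), fill = species)) +
  geom_histogram(bins = 30, na.rm = TRUE)

# Layer data after stat_bin() has run: one row per bin per species (group)
hist_data <- layer_data(p)
head(hist_data[, c("group", "xmin", "xmax", "count", "density", "ncount")])

# density x bin width sums to 1 within each species
area_by_group <- tapply(hist_data$density * (hist_data$xmax - hist_data$xmin),
                        hist_data$group, sum)
area_by_group
```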

Position Adjustments (1/6)

Code
machine <- read_table("datasets/MACHINE.DAT", 
                      skip = 25, 
                      col_names = c("machine", "day", "time", "sample", "diameter"), 
                      col_types = "ffffd")

machine

Position Adjustments (2/6)

Code
machine |> 
  ggplot() +
  geom_point(aes(x = day, y = diameter, 
                 color = machine))

Code
machine |> 
  ggplot() +
  geom_point(aes(x = day, y = diameter, 
                 color = machine), position = "jitter")

Position Adjustments (3/6)

Code
machine |> 
  ggplot() +
  geom_point(aes(x = day, y = diameter, color = machine), 
             position = position_jitter(width = 0.1, height = 0))

Code
machine |> 
  ggplot() +
  geom_point(aes(x = time, y = diameter, color = machine), 
             position = position_jitter(width = 0.1, height = 0))
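If you omit height, position_jitter() also nudges points vertically, by up to 40% of the data's resolution. Setting height = 0 keeps measured values exact, and a seed makes the jitter reproducible. MACHINE.DAT is not bundled with R, so this sketch uses palmerpenguins as a stand-in (assumed installed), with species as the discrete x. jitter_plot() is a hypothetical helper.

```r
library(ggplot2)

# Stand-in data (assumes palmerpenguins is installed); drop missing masses
peng <- subset(palmerpenguins::penguins, !is.na(body_mass_g))

# Hypothetical helper: horizontal-only jitter with a fixed seed
jitter_plot <- function(seed) {
  ggplot(peng, aes(x = species, y = body_mass_g)) +
    geom_point(position = position_jitter(width = 0.1, height = 0, seed = seed))
}

b1 <- layer_data(jitter_plot(seed = 42))
b2 <- layer_data(jitter_plot(seed = 42))

# Horizontal offsets stay within +/- 0.1 of the category positions (1, 2, 3)
x1 <- as.numeric(b1$x)
max(abs(x1 - round(x1)))

# height = 0: y values are the raw measurements, untouched
all.equal(sort(as.numeric(b1$y)), sort(as.numeric(peng$body_mass_g)))

# Same seed gives identical jitter
identical(b1$x, b2$x)
```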

Position Adjustments (4/6)

Code
machine |> 
  mutate(machine_time = str_c(machine, time, sep = "_")) |> 
  ggplot() +
  geom_point(aes(x = machine_time, y = diameter, color = machine), 
             position = position_jitter(width = 0.1, height = 0))

Position Adjustments (5/6)

Code
machine |> 
  ggplot() +
  geom_point(aes(x = machine, y = diameter, color = time), position = position_dodge(width = 0.5))

Position Adjustments (6/6)

Code
machine |> 
  ggplot() +
  geom_point(aes(x = machine, y = diameter, color = time), position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.5))

The Layered Grammar of Graphics

```r
ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>, 
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
```
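Here is one way to fill in every slot of the template above. It is an illustrative sketch assuming palmerpenguins is installed, and the choice of bar chart, dodge, coord_flip(), and facet_wrap() is arbitrary. The comments label which template slot each piece fills.

```r
library(ggplot2)

# Assumes the palmerpenguins package is installed
penguins_df <- palmerpenguins::penguins

p <- ggplot(data = penguins_df) +               # <DATA>
  geom_bar(                                     # <GEOM_FUNCTION>
    mapping = aes(x = island, fill = species),  # <MAPPINGS>
    stat = "count",                             # <STAT> (the default for geom_bar)
    position = "dodge"                          # <POSITION>
  ) +
  coord_flip() +                                # <COORDINATE_FUNCTION>
  facet_wrap(~ sex)                             # <FACET_FUNCTION>

built <- ggplot_build(p)
nrow(built$layout$layout)      # panels: female, male, and NA
sum(layer_data(p)$count)       # every penguin counted once
```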

Exploratory Data Analysis (EDA)

EDA with the MACHINE.DAT Data (1/3)

Code
machine |> 
  group_by(machine, time) |>
  summarise(mean_diameter = mean(diameter), 
            sd_diameter = sd(diameter), 
            # 95% CI half-width using the normal approximation (1.96)
            ci_diameter = 1.96 * sd_diameter / sqrt(n()),
            .groups = "drop") |>
  mutate(machine_time = str_c(machine, time, sep = "_")) |>
  ggplot() +
  geom_point(aes(x = machine_time, y = mean_diameter, color = machine), 
             position = position_dodge(width = 0.5)) +
  geom_errorbar(aes(x = machine_time, 
                    ymin = mean_diameter - ci_diameter, 
                    ymax = mean_diameter + ci_diameter), 
                width = 0.25, position = position_dodge(width = 0.5))
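The 1.96 multiplier comes from the normal distribution. For small groups, a Student's t multiplier with n - 1 degrees of freedom is the usual alternative. This base-R sketch compares the two for a few illustrative group sizes. The sizes are arbitrary and not taken from MACHINE.DAT.

```r
# Multiplier for a two-sided 95% interval around a mean
group_sizes <- c(5, 10, 30, 100)

multipliers <- data.frame(
  n = group_sizes,
  normal = qnorm(0.975),                 # about 1.96, as used in the plot code
  t = qt(0.975, df = group_sizes - 1)    # Student's t with n - 1 degrees of freedom
)
multipliers
```

Inside summarise(), the t version of the half-width would be `qt(0.975, df = n() - 1) * sd_diameter / sqrt(n())`. The two multipliers converge as n grows.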

EDA with the ggplot2 mpg Data (2/3)

Code
mpg
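A possible first pass at exploring mpg, which ships with ggplot2, is to check its size, count the levels of a categorical variable, and plot one relationship. This is only a sketch of typical EDA steps, assuming dplyr is installed, and not a prescribed workflow. The choice of displ versus hwy is illustrative.

```r
library(dplyr)
library(ggplot2)

# Size of the data: rows (cars) and columns (variables)
dim(mpg)

# How many cars in each vehicle class?
class_counts <- mpg |> count(class, sort = TRUE)
class_counts

# A first look at a relationship: engine displacement vs. highway mileage
p_hwy <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()
p_hwy
```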

Communication

End of Module 04

References